An Effective Hash-Based Algorithm for Mining Association Rules

Authors

  • Jong Soo Park
  • Ming-Syan Chen
  • Philip S. Yu
Abstract

In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. The mining of association rules can be mapped into the problem of discovering large itemsets, where a large itemset is a group of items that appear together in a sufficient number of transactions. The problem of discovering large itemsets can be solved by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally this is done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. Determining large itemsets from a huge number of candidate large itemsets in early iterations is usually the dominating factor for the overall data mining performance. To address this issue, we propose an effective hash-based algorithm for candidate set generation. Specifically, the number of candidate 2-itemsets generated by the proposed algorithm is orders of magnitude smaller than that generated by previous methods, thus resolving the performance bottleneck. Note that the generation of smaller candidate sets enables us to effectively trim the transaction database size at a much earlier stage of the iterations, thereby significantly reducing the computational cost for later iterations. An extensive simulation study is conducted to evaluate the performance of the proposed algorithm.
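The hash-based filtering of candidate 2-itemsets described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the integer item encoding, the bucket count, and the hash function `bucket_of` are all assumptions chosen for illustration. The idea shown is that, while counting single items in the first pass, every pair in each transaction is also hashed into a small table of bucket counters; a pair of large items is kept as a candidate only if its bucket count reaches the minimum support.

```python
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    # Hypothetical hash function, chosen only for illustration.
    a, b = pair
    return (a * 10 + b) % num_buckets


def candidate_pairs(transactions, min_support, num_buckets):
    """Return (all_pairs, filtered_pairs) of candidate 2-itemsets.

    all_pairs: every pair of large items (no hash filtering).
    filtered_pairs: the pairs that also pass the hash bucket filter.
    """
    item_counts = Counter()
    bucket_counts = [0] * num_buckets

    # Single pass: count items and hash every pair into a bucket.
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1

    large_items = sorted(i for i, c in item_counts.items() if c >= min_support)
    all_pairs = list(combinations(large_items, 2))

    # A bucket count is an upper bound on the support of each pair in it,
    # so dropping pairs in light buckets never discards a large pair.
    filtered_pairs = [
        p for p in all_pairs
        if bucket_counts[bucket_of(p, num_buckets)] >= min_support
    ]
    return all_pairs, filtered_pairs


def large_pairs(transactions, candidates, min_support):
    """Count the exact support of each candidate and keep the large ones."""
    candidate_set = set(candidates)
    support = Counter()
    for transaction in transactions:
        for pair in combinations(sorted(set(transaction)), 2):
            if pair in candidate_set:
                support[pair] += 1
    return {p: c for p, c in support.items() if c >= min_support}


transactions = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 4], [3, 4]]
all_pairs, filtered = candidate_pairs(transactions, min_support=2, num_buckets=7)
result = large_pairs(transactions, filtered, min_support=2)
```

Because several pairs can share a bucket, the filter may let through some pairs that are not actually large, so the exact counting step is still needed. The benefit is that this step then runs over fewer candidates than it would without the filter.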

Similar resources

Hash Based Parallel Algorithms for Mining Association Rules

In this paper, we propose four parallel algorithms (NPA, SPA, HPA and HPA-ELD) for mining association rules on shared-nothing parallel machines to improve performance. In NPA, candidate itemsets are simply copied to all the processors, which can lead to memory overflow for large transaction databases. The remaining three algorithms partition the candidate itemsets over the processors. If ...

Introducing an algorithm for use to hide sensitive association rules through perturb technique

Due to the rapid growth of data mining technology, obtaining private data on users through this technology becomes easier. Association Rules Mining is one of the data mining techniques to extract useful patterns in the form of association rules. One of the main problems in applying this technique on databases is the disclosure of sensitive data by endangering security and privacy. Hiding the as...

A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining

Data envelopment analysis (DEA) is a relatively new data-oriented approach to evaluating the performance of a set of peer entities called decision-making units (DMUs) that convert multiple inputs into multiple outputs. Within a relatively limited period, DEA has become a strong quantitative and analytical tool for measuring and evaluating performance. In an article written by Toloo et al. (2009...

An Incremental Mining Algorithm for Association Rules Based on Minimal Perfect Hashing and Pruning

In the literature, hash-based association rule mining algorithms are more efficient than Apriori-based algorithms, since they employ hash functions to generate candidate itemsets efficiently. However, when the dataset is updated, the whole hash table needs to be reconstructed. In this paper, we propose an incremental mining algorithm based on minimal perfect hashing. In our algorithm, each can...

A New Ontology Based Association Rules Mining Algorithm

Because traditional data mining techniques cannot be directly applied to the semi-structured XML data mining problem, this paper proposes a novel ontology and association rules based XML mining algorithm. The algorithm first introduces the domain ontology and hash technology to improve the operation of emerging frequent item sets and generating association rules, then uses a hash table to store t...

Publication date: 1995